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Abstract 

Recent results in compressed sensing show that, under certain conditions, the sparsest so- 
lution to an underdetermined set of linear equations can be recovered by solving a linear pro- 
gram. These results either rely on computing sparse eigenvalues of the design matrix or on 
properties of its nullspace. So far, no tractable algorithm is known to test these conditions and 
most current results rely on asymptotic properties of random matrices. Given a matrix A, we 
use semidefinite relaxation techniques to test the nullspace property on A and show on some 
numerical examples that these relaxation bounds can prove perfect recovery of sparse solutions 
with relatively high cardinality. 

Keywords: Compressed sensing, nullspace property, semidefinite programming, restricted 
isometry constant. 

1 Introduction 

A recent stream of results in signal processing have focused on producing explicit conditions under 
which the sparsest solution to an underdetermined linear system can be found by solving a linear 
program. Given a matrix A E R mxn with n > m and a vector v E R m , writing ||x|| = Card(x) 
the number of nonzero coefficients in x, this means that the solution of the following (combinato- 
rial) £ minimization problem: 

minimize ||x||o 
subject to Ax = v, 

in the variable x E R n , can be found by solving the (convex) £ 1 minimization problem: 



(1) 



minimize ||x||i 
subject to Ax = 



(2) 
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in the variable x E R n , which is equivalent to a linear pr ogram- 



Based on result s by IVershik and Sporyshevl (119921) and lAffentranger and Schneided (|1992l) . 
Donoho and Tanned (|2005l) show that when the solution x of (OQ) is sparse with Card(a; ) = k 
and the coefficients of A are i.i.d. Gaussian, then the solution of the l\ problem in ([2]) will always 
match that of the i problem in (OQ) provided k is below an explicitly computable strong recovery 
threshold k s . They also show that if k is below another (larger) weak recovery threshold k w , then 
these solutions match with an exponentially small probability of failure. 

Universal cond i tions for st rong recovery based on sparse extremal eigenvalues were derived 



m 



Candes and Tad (120051) and ICandes and Tad (120061) who also proved that certain (mostly ran- 



dom) matrix classes satisfied these conditions with an exponentially small probability of failu re. 
Simpler, weaker co nditions which can be traced back to lDonoho and Hud(|200l|) . lzhangl(|2005h or 



Cohen et al.l (|2006r) for example, are based on properties of the nullspace of A. In particular, if we 



define 



Otk 



max 

{Ax=0, ||x||] 



=1} {llvll 



max y T x, 
>=i> lly||i<fc} 



these references show that < 1/2 guarantees strong recovery^ 



One key issue with the current sparse recovery conditions in lCandes and Taol (120051) or lDonoho and Huo 



(|200l|) is that except for explicit recovery thresholds available for certain types of random matrices, 
testing these conditions on generic matrices is potentially harder than solving the combinatorial £ - 
norm minimization problem in (OQ) for example as it implies either solving a combinatorial problem 
to compute o^, or c omputing sparse eig e nvalue s. Semidefinite relaxation bounds on sparse eigen- 
values were used in Id' Aspremont et al.l (120081) or Lee and Breslerj (|2008l) for example to test the 
restricted is ometry conditions in ICandes and Tad (|2005l) on arbitrary matrices. In recent indepen- 
dent results. I Juditskv and Nemirovskil (|2008|) provide an alternative proof of some of the results in 
Donoho and Hud ( 200 lb . extend them to the noisy case and produce a linear programming (LP) 



relaxation bound on a k with explicit performance bounds. 

In this paper, we derive a semidefinite relaxation bound on a>k, study its tightness and perfor- 
mance. By randomization, the semidefinite relaxation also produces lower bounds on the objective 
value as a natural by-product of the solution. Overall, our bounds are slightly better than LP ones 
numerically but both relaxations share the same asymptotic performance limits. However, because 
it involves solving a semidefinite program, the complexity of the semidefinite relaxation derived 
here is significantly higher than that of the LP relaxation. 



The pa per is organized as fo llows. In Section[2l we briefly recall some key results in lDonoho and Huo 



(|200l|) and lCohen et al.1 (|2006l) . We derive a semidefinite relaxation bound on a>k in Section[3] and 
study its tightness and performance in Section|4l Section|5]describes a first-order algorithm to solve 
the resulting semidefinite program. Finally, we test the numerical performance of this relaxation 
in Section |6] 



Notation To simplify notation here, for a matrix X £ R mxn 5 we wr ite its columns Xj, ||X||i the 
sum of absolute values of its coefficients (not the t\ norm of its spectrum) and HXH^ the largest 
coefficient magnitude. More classically, \\X\\ F and \\X\\ 2 are the Frobenius and spectral norms. 
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2 Sparse recovery & the null space property 



Given a coding matrix A G R' 
v G R m such that 



with n > m, a sparse signal x G R" and an information vector 



Axr 



we focus on the problem of perfectly recovering the signal x from the vector v, assuming the 
signal x is sparse enough. We define the decoder Ai(v) as a mapping from R m — > R n , with 



Ai(i>) = argmin ||x||i. 

{xeR n : Ax=v} 



(3) 



This particular decoder is equivalent to a linear program which can be solved efficiently. Suppose 
that the original signal xq is sparse, a natural quest ion to ask is then: Wh e n does this decoder 



perfec tly re cover a sparse signa l xq! Recent results by lCandes and Tad(|2005|) . lDonoho and Tanner 



(2005) and ICohen et al.1 (|200q) provide a somewhat tight answer. In particular, as in ICohen et al. 
(|2006|) . for a given coding matrix A G R mxn and k > 0, we can quantify the i\ error of a decoder 
A(i>) by computing the smallest constant C > such that 



for all x G R" , where 



\x-A{Ax)\\ x <Ca k {x) 



(Tk[x) = mm \\x 

{zGR": Card(z)=fc} 



(4) 



* i 



is the £i error of the best A;-term approximation of the signal x and can simply be computed as the 
It norm of the n — k sm all est coefficien t s of x G R™. We now define the nullspace property as in 
DonohoandHuol d200lh or lCohen et all d2006h . 



Definition 1 A matrix A G R mxn satisfies the null space property in l\ of order k with constant Ck 
if and only if 

p||i < Cfc||zrc||i (5) 

holds for all z G R" with Az = and index subsets T C [1 , n] of cardinality Card(T) < k, where 
T c is the complement ofT in [l,n]. 



Cohen et al.1 (|2006h for example show the following theorem linking the optimal decoding quality 
on sparse signals and the nullspace property constant Ck- 

Theorem 2 Given a coding matrix A G R mxn and a sparsity target k > 0. If A has the nullspace 
property in ([3]) of order 2k with constant C/2, then there exists a decoder A which satisfies (0) 
with constant C. Conversely, if® holds with constant C then A has the nullspace property at the 
order 2k with constant C. 



Proof. See (ICohen etalll2006L Corollary 3.3) 



This last result means that the existence of an optimal decoder staisfying © is equivalent to A 
satisfying ©. Unfortunately, this optimal decoder A (i> ) is defined as 



A (v) = argmin a k (z) 

{zGR": Az=v} 
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hence requires solving a combinatorial problem which is potentially intractable. However, using 
tighter restrictions on the nullspace property constant Ck, we get the following result about the 
linear programming decoder Ai(v) in ©. 

Theorem 3 Given a coding matrix A E R mxn and a sparsity target k>0.IfA has the nullspace 
property in © of order k with constant C < 2, then the linear programming decoder Ai(y) in © 
satisfies the error bounds in © with constant 2C /(2 — C) at the order k. 



Proof. See steps (4.3) to (4. 10) in the proof of dCohen et all I2006L Theorem 4.3). 



To summarize the results above, if there exists a C > such that the coding matrix A satisfies 
the nullspace property in © at the order k then there exists a decoder which perfectly recovers 
signals x with cardinality k/2. If, in addition, we can show that C < 2, then the linear program- 
ming based decoder in © perfectly recovers signals x with cardinality k. In the next section, we 
produce upper bounds on the constant C k in © using semidefinite relaxation techniques. 



3 Semidefinite Relaxation 

Given A E R mxn and k > 0, we look for a constant C k > 1 in © such that 

||^t||i < {C k - l)||a>H|i 

for all vectors x E R n with Ax = and index subsets T C [l,n] with cardinality k. We can 
rewrite this inequality 

||zt||i < a k \\x\\i (6) 

with a k E [0, 1). Because a k = 1 — l/C^, if we can show that a k < 1 then we prove that A 
satisfies the nullspace property at order k with constant C k . Furthermore, if we prove a k < 1/2, 
we prove the existence of a linear programming based decoder which perfectly recovers signals x 
with at most k errors. By homogeneity, the constant a k can be computed as 

ak = max max y T x, (7) 

{Ax=o, |M|i=i} {\\y\\oc=i, \\yh<k} 

where the equality ||x||i = 1 can, without loss of generality, be replaced by ||a;||i < 1. We now 
derive a semidefinite relaxation for problem © as follows. After a change of variables 

( X Z T 
\ Z Y 

we can rewrite © as 

maximize Tr(Z) 
subject to AXA T = 0, ||X||i < 1, 

Halloo <1, ll^l|l<^ 2 , ll^l|l<^ (8) 

( X z Z y)^ *^{ X z y)- 1 * 
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in the variables X, Y E S n , Z E R nxn 5 where all norms should be understood componentwise. We 
then simply drop the rank constraint to form a relaxation of © as 



maximize 
subject to 




(9) 



which is a semidefinite program in the variables X, Y E S n , Z E R nxn . Note that the contraint 
|| Z ||i < k is redundant in the rank one problem but not in its relaxation. Because all constraints 
are linear here, dropping the rank constraint is equivalent to computing a Lagrangian (bidual) 
relaxation of the original problem and adding redundant constraints to the original problem often 
tightens these relaxations. The dual of program © can be written 

minimize ||£/i||oo + ^ 2 II ^2 1| 00 + ll^lli + ^I^IU 

( Ux - A T WA -i(I + UA \ ^ n 
subject to { _ W + UI) l + u J)^ 

which is a semidefinite program in the variables Ui, U 2 , t/ 3 , W E S n and U4 E R nxn . For any 
feasible point of this program, the objective ||t/i||oo + A: 2 1 1 1 1 00 + ll^lli + &||£4||oo is an upper 
bound on the optimal value of ©, he nce on a k . We c an further simplify this program using 



elimination results for LMIs. In fact, (IBoyd et all 1 19941 §2.6.2) shows that this last problem is 
equivalent to 

minimize ||t/i||oo + A; 2 ||[/ 2 ||oo + ll^lli + fc||t4||oo 

subject to ( U ^ wATA -W + Vt)\ (10) 
subject to ^ _l ( i + U T ) U2 + U3 

where the variable w is now scalar. In fact, using the same argument, letting P E R nxp be an 
orthogonal basis of the nullspace of A, i.e. such that AP = with P T P = I, we can rewrite the 
previous problem as follows 

minimize Ht/Joo + A; 2 1| t^Hoo + ||[/ 3 ||i + /c||C/ 4 ||oo 

1 ■ / P T U,P -\pT{i + u A ) \ (11) 

subject to [_ W + u t )p \ 2 + Uz )h0, 

which is a (smaller) semidefinite program in the variables U±, U2, U3 E S n and U4 E R nxn . The 
dual of this last problem is then 

maximize Tr(Q^-P) 

subject to ||PQiP T ||i < 1, \\PQl\\i < k 

IIQslloo < 1, IIQ3II1 < k 2 (12) 



Qi Qi 
Q2 Qs 



ho, 
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which is a semidefinite program in the matrix variables Qi G S p , Q 2 G R pxn , Q 3 g S n , whose 
objective value is equal to that of problem ©. 

Note that adding any number of redundant constraints in the original problem © will further 
improve tightness of the semidefinite relaxation, at the cost of increased complexity. In particular, 
we can use the fact that when 

Nli = i, IMIoc = l, \\y\\i < k, 

and if we set Y = yy T and Z = yx T , we must have 

n 

\Yj\ < ktj, \Yij\ < tj, l T t < k, t < 1, for i, j = 1, . . . ,n, 

i=i 

and 



^ \Zij\ < krj, |%| < rj, l T r < k, for i,j = 1, 



n, 



i=l 



for r, t G R n . This means that we can refine the constraint \\Z\lx < A; in © to solve instead 

maximize Tr(Z) 

subject to AXA T = 0, ||X||i < 1, 

Si=i l^il ^ ktj, \Yij\ < tj, l T t < k, t < 1, 

Er=i \ z v\ ^ kr v \ Z ij\ ^ r i' lTr ^ !> for J = !>•••> 



>r 0, 



which is a semidefinite program in the variables X,Y G S n , Z G R nxn and r, i G R™. Adding these 
columnwise constraints on y and Z significantly tightens the relaxation. Any feasible solution to 
the dual of (fT3l) with objective value less than 1/2 will then be a certificate that a k < 1/2. 



4 Tightness & Limits of Performance 

The relaxation ab ove naturally produces a coyarianc e matrix as its output and we use randomization 



techniques as in iGoemans and Williamson! (|1995|) to produce primal solutions for problem © 



Then, following results by A. Nemirovski (private communication), we bound the performance of 
the relaxation in ©. 

4.1 Randomization 

Here, we show that lower bounds on a fc can be generated as a natural by-product of the relaxation. 
We use solutions to the semidefinite program in © and generate feasible points to © by random- 
ization. These can then be used to certify that a k > 1/2 and prove that a matrix does not satisfy 
the nullspace property. Suppose that the matrix 

X Z T \ 

Z Y (14) 
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solves problem ®, because r >z 0, we can generate Gaussian variables (x, y) ~ A/"(0, T). Below, 
we show that after proper scaling, (x, y) will satisfy the constraints of problem © with high 
probability, and use this result to quantify the quality of these randomized solutions. We begin 
by recalling classical results on the moments of ||rr||i and ||a;||c» when x ~ fif(0,X) and bound 
deviations above their means using concentration inequalities on Lipschitz functions of Gaussian 
variables. 



Lemma 1 Let X G S n , x ~ A/"(0, X) and 5 > 0, we have 



\ x \\i > 1 ) < 1 



(15) 



Proof. Let P be the square root of X and Ui ~ W(0, 1) be independent Gaussian variables, we 
have 



F i 



E 

i=l 



3=1 



hence, because each term | Y^j=i Pij u j \ l% a Lipschitz continuous function of the variables u with 

constant (X)j=i -Pjj) 1//2 = ( X^) 1 / 2 , \\x\\-\ is Lipschi tz with co nstant L = Y^2= i (Xu) 1 ^ 2 . Using the 
concentration inequality by llbragimov et al.1 (| 197 61) (see also lMassartl (|2007|) for a general discus- 
sion) we get for any (3 > 

x||i E[||x||i] +t 

> = I < exp 
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with E[||x||i] = \/2AEr=i ( x u) 1/2 - Picking t = ^2\Sg5L and = EfHzy + t yields the 
desired result. ■ 



We now recall another classic result on the concentration of ||y||oo, also based on the fact that 
|y||oo is a Lipschitz continuous function of independent Gaussian variables. 



Lemma 2 Let Y e S n , y ~ A/"(0, Y) and 5 > ?/zen 

IMIoo 



(V21og2ra + V2bgT) max i= i,..., n (y ii ) 1 /2 " X J " I 



1 



(16) 



Proof. (IMassartl I2007L Theorem 3.12) shows that ||y||oo is a Lipschitz function of independent 
Gaussian random variables with constant max !=li ...^(Yu) 1 ^ 2 , hence a reasoning similar to that in 
lemma Q] yields the desired result. ■ 

Using union bounds, the lemmas above show that if we pick 3/5 < 1 and (x, y) ~ A/"(0, V), 
the scaled sample points 

x y 

g(X,5Y h(Y,n,k,5) 



1 



will be feasible in © with probability at least 1 — 3/5 if we set 

n 

g(X,S) = (v^+ ^/2Io^)^(X ^^ ) 1/2 d7) 

i=l 

and 

h(Y, n, k, 6) = max { ( ^2h^ + ^2^5) max {Y^'\ ( ^ + V ^f ) E ^ ^ \ 

i=l,...,n K 

(18) 

The randomization technique is then guaranteed to produce a feasible point of © with objective 
value 

g{X,8)h(Y,n,k,6) 

where g{i_3/«5} is the 1 — 3/5 quantile ofx T y when (x, y) ~ A/"(0, T). We now compute a (relatively 
coarse) lower bound on the value of that quantile. 



Lemma 3 Let e, 5 > 3 and (x, y) ~ A/"(0, T), wzY/z T defined as in rf7?l) . ?/zen 



P J>^>Tr(Z) 



,i=l 



(19) 



where 



a 2 = \\Z\\ 2 F + Tr{XY). 
Proof. Let S e R 2nx2n be such that T = S T S and (x, y) ~ W(0, T), we have 
E \{y T x) 2 ] = El J= ^[(Srw)(S^ +l w)(Sjw)(S^w)] 
where w is a standard normal vector of dimension 2n. Wick's formula implies 

^ii -^-ij ^ij ^ 

E[(5»(^)(5»(^)] = Haf 



7 v 7 \ • 



Xij Zij Xjj Zjj 
\ Yij Zjj Yjj J 

ZiiZjj -\- Z^j -\- XijYij, 



where Haf (X) is the Hafnian of the matrix X (see lBarvinokl (|2007|) for example), which means 

E [(y T x) 2 ] = (Tr(Z)) 2 + ||Z||| + Tr(XY). 

Because E[y T x] = E[Tr(xy T )} = Tr(E[xy T }) = Tr(Z), we then conclude using Cantelli's 
inequality, which gives 



P\Y^x iyi <Tr{Z)-taj < — |- 



having set t = y/3/ \/8 — 3. ■ 

We can now combine these results to produce a lower bound on the objective value achieved 
by randomization. 

Theorem 4 Given A e R mxn , e > and k > 0, writing SDP k the optimal value of^, we have 

SDP k - e 



g(X,5)h(Y,n,k,5) 

where 



<a k < SDP k (20) 



5 o , WIS- + Tr(XY)) 



c 2 



g(X, 6) = (y/2fr + y/2fo£5) £ (X u ) 11 



1/2 
i=l 

and 

hfv u x\ \ ( /oi — /ol — m/2 (v 7 ^ + v^logo") ^=1 ( F «) 1/2 
h[Y, n, k, d) = max < (a/2 log 2n + a/2 logo) max [Ya) ' , — ; — 

i=l,...,n k 

Proof. If T solves © and the vectors (x, y) are sampled according to (x, y) ~ W(0, V), then 

E[(Ar)(Ar) T ] = E[Arx T A T ] = AXA T = 0, 
means that we always have Ax = 0. When 5 > 3, Lemmas Q] and [2] show that 

x y 



g(X,5Y h(Y,n,k,5) 

will be feasible in © with probability at least 1 — 3/5, hence we can get a feasible point for ([7]) 
by sampling enough variables (x, y). Lemma|3] shows that if we set 5 as above, the randomization 
procedure is guaranteed to reach an objective value y T x at least equal to 

Tr(Z) - e 



g(X,5)h(Y,n,k,5) 

which is the desired result. ■ 

Note that because r h 0, we have Zfj < XuYjj, hence \\Z\\ 2 F < Tr(X) Tr{Y) < k 2 . We also 
haveTr(XF) < ||X||i||F||i < fc 2 hence 

6k 2 

5<3 + —. 

e 2 
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and the only a priori unknown terms controlling tightness are Ym=i(Xu) 1 I 2 , Y^^iO^a) 1 ^ and 
max i=li ^(Yjj) 1 / 2 . Unfortunately, while the third term is bounded by one, the first two can become 
quite large, with trivial bounds giving 

n n 

X)(*«) 1/2 <v^ and E(^) 1/2 ^v^, 

i=l i=l 

which means that, in the worst case, our lower bound will be off by a factor 1/n. However, we 
will observe in Section |6] that, when k = 1, these terms are sometimes much lower than what the 
worst-case bounds seem to indicate. The expression for the tightness coefficient 7 in (fT4l) also 
highlights the importance of the constraint ||Z||i < k. Indeed, the positive semidefinitess of 2 x 2 
principal submatrices means that Zfj < XuYjj, hence 

ii^iii < (e<*«) 1/2 ) (|> i)1/2 ) ' 

so controlling || Z || 1 potentially tightens the relaxation. This is confirmed in numerical experiments: 
the relaxation including the (initially) redundant norm constraint on Z is significantly tighter on 
most examples. Finally, note that better lower bounds on a. k can be obtained (numerically) by 
sampling ||a>r||i/IMIi in © directly, or as suggested by one of the referees, solving 

maximize c T x 

subject to Ax = 0, ||x||i < 1, 

in x E R n for various random vectors c e { — 1, 0, l} n with at most k nonzero coefficients. In both 
cases unfortunately, the moments cannot be computed explicitly so studying performance is much 
harder. 



4.2 Performance 

Following results by A. Nemirovski (private communication), we can derive precise bounds on the 
performance of the relaxation in ©. 

Lemma 4 Suppose (X, Y, Z) solve the semidefinite program in ([9]), then 

Tr(Z) = «i 

and the relaxation is tight for k — 1. 

Proof. First, notice that when the matrices (X, Y, Z) solve ©, AX = with 



X Z T 
Z Y 



y 



means that the rows of Z also belong to the nullspace of A. If A satisfies the nullspace property 
in ©, we must have \Za\ < a\ YTj=i f° r * = 1, • • • , hence Tr(Z) < ai||Z||i < a±. By 
construction, we always have Tr(Z) > ol\ hence Tr(Z) = ol\ when Z solves © with k = 1. m 
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As in Juditsky and Nemirovskil ( 2008b . this also means that if a matrix A satisfies the restricted 
isometry property at cardinality 0(m) (as Gaussian matrices do for example), then the relaxation 
in © will certify < 1/2 for k = 0(\/m). Unfortunately, the results that follow show that this 
is the best we can hope for here. 

Without loss of generality, we can assume that n = 2m (if n > 2m, the problem is harder). Let 
Q be an orthoprojector on a (n— m) -dimensional subspace of the nullspace of A, with Rank(Q) = 
n — m = m. By construction, ||Q||i < ra||Q|| 2 = riy/n,0 ^ Q ^ I and of course AQ = 0. We can 
use this matrix to construct a feasible solution to problem (fT3l) when k = y/m. We set X = Q/y/n, 

, n. We then have 



n, 



Y = Q/(n\/m), Z = Q/n, tj = l/(ny/m) and rj = 1/n for j = 1, 



II; 



i 1 



IIQilli < \\Qih < 



< kti 



and \\YiWoo < ||li|| 2 < l/(ny/m) with l T t < k. We also get 



\Zi i 



11*11. < Mi < kri . 



With 



n 
n 



1/2 

-i 



ii 



n 

rT x m 



l,...,n. 



-1/2 



^ 0, 



the matrices we have defined above form a feasible point of problem (TT3T) . Because, Tr(Z) = 
Tr(<5)/n = 1/2, this feasible point proves that the optimal value of (fT3l) is larger than 1/2 when 
n = 2m and k = y/m. This means that the relaxation in (TT3T) can prove that a matrix satisfies the 
nullspace property for cardinalities at most k = 0(y/m) and this performance bound is tight since 
we have shown that it achieves this rate of 0(y/m) for good matrices. 

This counter example also produces bounds on the performance of another relaxation for testing 
sparse recovery. In fact, if we set X = Q/m with Q defined as above, we have Tr(X) = 1 with 
X y and 

IIQIIi 



x 



and X is an optimal solution of the problem 



m 



< 2yfm 



minimize Tr(XAA T ) 
subject to ||X||i < 2\/2m 

Tr(X) = 1, X y 0, 



which is a semidefinite relaxation used in ld'Aspremont et al. I J2007h and ld'Aspremont et all d2008h 
to bound the restricted isometry constant 6k(A). Because Tr(XAA T ) = by construction, we 
know that this last relaxation will fail to show Sk(A) < 1 whenever k = 0{\/m). Somewhat 
str ikingly, this means t hat the three different tractable te s ts for sparse recovery conditions, derived 



in 



d'Aspremont et al. ( 2008 ). Juditsky and Nemirovskil ( 2008 ) and this paper, are all limited to 
showing recovery at the (suboptimal) rate k = 0(\/m). 



11 



5 Algorithms 



Small inst ances of the s emidefinite program in (fTTI) and be solved efficiently using solvers such as 
SEDUMI (|SturmLll999r) or SDPT3 (|Toh et all 1 19990 . For larger instances, it is more advantageous 
to solve (fTTj) using first order techniques, given a fixed target for a. We set P £ R nx P to be an 
orthogonal basis of the nullspace of the matrix A in ©, i.e. such that AP = with P T P = I. 
We also let a be a target critical value for a (such as 1/2 for example), and solve the following 
problem 

P T XJ X P -±P r (I + f/ 4 )\ 
-\{l + Vj)P U 2 + U 3 J (21) 

-fc 2 ||^ 2 ||oo+ Ml + fclMoo <« 



maximize A, 



subject to \\Ui\ 



in the variables Ui, U 2 , U 3 £ S n and U4 £ R nxn . If the objective value of this last problem is 
greater than zero, then the optimal value of problem (fTTI) is necessarily smaller than a, hence 
a < a in ©. 

Because this problem is a minimum eigenvalue maximization problem over a simple compact 
(a norm ball in fact), large-scale instances can be solv ed efficiently using projected gradient algo - 
rithms or smooth semidefinite optimization techniques (|NesterovL 120071 : Id ' Aspremont et al 1 12007|) . 
As we show below, the complexity of projecting on this ball is quite low. 



Lemma 5 The complexity of projecting (x , Vo, ^o, wq) £ R on 

(ML + A: 2 ||?/||oo + IMIi + ^Iklloo < a 
is bounded by 0(n log n log 2 (l / e)), where e is the target precision in projecting. 
Proof. By duality, solving 



minimize ||x — xq\\ + \\y — yo\\ + \\z — z \ 
subject to \\x\loo + fc 2 !!!/!!^ + \\z\\i + fc||w||oc 



+ \\w — Wq\ 

< a 



in the variables x, y, z £ R n is equivalent to solving 

| 2 + Alblloo + AA^Hylloo + A||z||i + Xk\\w\ 



max mm \\{x,y, z,w) 

A>0 x,y,z,w 



{x ,yo : ZoiW 



in the variable A > 0. For a fixed A, we can get the derivative w.r.t. A by solving four separate 
penalized least-squares problems. Each of these problems can be solved explicitly in at most 
0(n\ogn) (by shrinking the current point) so the complexity of solving the outer maximization 
problem up to a precision e > by binary search is 0(n log nlog 2 (l/e)) ■ 



We can then implement the smooth minimiz ation algo r ithm d et ailed in dN estero vl. I2005L ^ 5.3) 
to a smooth approximation of problem (f2Tb as in lNesterovl (|2007|) or Id' Aspremont et al.l (|2007|) for 
example. Let p > be a regularization parameter. The function 



/ M (X) = /ilog Trexp 



(22) 
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satifies 

A m axPO < U(X) < A max (X) +/ilogn 

for any X E S n . Furthermore, fn(X) is a smooth approximation of the function A max (X), and 
Vfn(X) is Lipschitz continuous with constant log n/p. Let e > be a given target precision, this 
means that if we set fi = ej (2 logn) then 



f(u) = -U 



-P T U X P 



ul)P 



\p t {i 

-{U 2 - 



U 3 ) 



will be an e/2 approximation of the objective function in (I2TT) . Whenever 
have 



where U = (U lt U 2 ,U 3 ,U 4 ), (23) 
F < 1, we must 



-P T U 1 P P T U i /2 
UfP/2 -{U 2 + U 3 



< \\P T UiP 



\Uo 



Us\\l 



\P U 4 \\l < 4 



hence, following ( Nesterovl 20071 §4), the gradient of f(U) is Lipschitz continuous with respect 
to the Frobenius norm, with Lipschitz constant given by 

8 log(n + p) 



We then define the compact, convex set Q as 

q = {(fA, u 2 , u 3 , u 4 ) g : n^iu + tfWUzWn 



\U 3 \\l + &||Kl||oc < ot) 



and define a prox function d(U) o\erQasd(U) = \\U\\ F /2, which is strongly convex wi t h cons tant 
a = 1 w.r.t. the Frobenius norm. Starting from Uq = 0, the algorithm in iNesterovl (|2005l) for 
solving 

maximize f(U) 
subject to U E Q, 

where f(U) is defined in d23l . proceeds as follows. 
Repeat: 

1. Compute f(Uj) and Vf(Uj) 

2. FindF, = argmin yeQ (V/(^),F) + \L\\Ui - Yf F 



E 



j+i 

j=0 2 



3. Find^=argmin WeQ {™ 

4. Set U j+1 = fyW) + igY, 
Until gap < e. 

Step one above computes the (smooth) function value and gradient. The second step computes 
the gr adient mapping, which matches the gradient step for unconstrain ed problems (see (|NesterovL 



2003L p.86)). Step three and four update an estimate sequence see (|NesterovL I2003L p.72) of / 



whose minimum can be computed explicitly and gives an increasingly tight upper bound on the 
minimum of /. We now present these steps in detail for our problem. 
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Step 1 The most expensive step in the algorithm is the first, the computation o f f and its gradient. 
This a mounts to computing the matrix exponential in (1221) at a cost of 0(n 3 ) (see lMoler and Van Loan 
d2003h for details). 



Step 2 This step involves solving a problem of the form 

argmin (Vf(U),Y) + ^L\\U - Y\\%, 

Y&Q * 

where U is given. The above problem can be reduced to an Euclidean projection on Q 

argmin ||y - V\\ F , (24) 

\\Y\\eQ 

where V = U+L~ l V f^(U) is given. According to Lemma[5l this can be solved 0(n log nlog 2 (l/e)) 
opearations. 



Step 3 The third step involves solving an Euclidean projection problem similar to (l24l) . with V 
defined here by: 

3=0 



Stopping criterion We stop the algorithm when the duality gap is smaller than the target preci- 
sion e. The dual of the binary optimization problem (|2T)) can be written 

minimize a max{||PG 11 P T || 1 , \\G 22 \U ^%^} - Tr(PG 12 ) 

subject to Tr(G) = l,GhO, K ' 

in the block matrix variable G E S n+P with blocks Gij, i,j = 1,2. Since the gradient V/(P) 
produces a dual feasible point by construction, we can use it to compute a dual objective value and 
bound the duality gap at the current point U. 



Complexity According to iNesterovl (12007b . the total worst-case complexity to solve (I2TI) with 
absolute accuracy less than e is then given by 



O 



n \/\ogn 



and 
Note that 



Each iteration of the algorithm requires computing a matrix exponential at a cost of 0(n 3 
the algorithm requires 0(nyJ\og n/e) iterations to reach a target precision of e > 0. 
while this smooth optimization method can be used to produce reasonable complexity bounds for 
checking if the optimal value of (I2TI) is positive, i.e. if a k < a, in practice the algorithm is relatively 
slow and we mostly use interior point solvers on smaller problems to conduct experiments in the 
next section. 
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6 Numerical Results 



In this section, we illustrate the numerical performance of the semidefinite relaxation detailed in 
section [3] 



6.1 Illustration 

We test the semidefinite relaxation in (PTTT) on a sample of ten random Gaussian matrices A E RP xn 
with Aij ~ jV(0, 1/ y/p), n = 30 and p = 22. For each of these matrices, we solve problem (fTTI) 

for fc = 2 5 to prod uce upper bounds on hence on C\, in ©, with a fc = 1 — 1/Cfc. From 

Donoho and Huol ( 200l|) . we know that if a>k < 1 then we can bound the decoding error in @), and 
if ctk < 1/2 then the original signal can be recovered exactly by solving a linear program. We also 
plot the randomized values for y T x with k = 1 together with the semidefinite relaxation bound. 
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Figure 1: Bounds on a*,-. Left: Upper bounds on au obtained by solving (fTTT) for various 
values of k. Median bound over ten samples (solid line), dotted lines at pointwise minimum 
and maximum. Right: Lower bound on ol\ obtained by randomization (red dotted line) 
compared with semidefinite relaxation bound (SDP dashed line). 



Next, in Figure [2l we use a Gaussian matrix A E R pxn with A^ ~ A/"(0, l/ v /p), n = 36 
and p = 27 and, for each fc, we sample fifty information vectors v = Axq where xq is uniformly 
distributed and has cardinality k. On the left, we plot the probability of recovering the original 
sparse signal a;o using the linear programming decoder in ©. On the right, we plot the mean i\ 
recovery error ||x — xq || i using the linear programming decoder in © and compare it with the 
bound induced by Theorem [3] 
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Figure 2: Sparse Recovery. Left: Empirical probability of recovering the original sparse 
signal using the LP decoder in OJ). The dashed line is at the strong recovery threshold. 
Right: Empirical mean l\ recovery error \x— a?o||i using the LP decoder (circles) compared 
with the bound induced by Theorem[3](squares). 



6.2 Performance on compressed sensing matrices 

In tables [Q [2] and [31 we compare the performa nce of the linear programming relaxation bound 
on a fe derived in ljuditsky and Nemirovskil (12008b with that of the semide finite programming bound 
detailed in Section [3] We test these bounds for various matrix shape ratios p = m/n, ta rget cardi 



nalities k on matrices with Fourier, Bernoulli or Gaussian coefficients using SDPT3 by iToh et al. 



(| 19991) to solve problem (fTTj) . We show median bounds computed over ten sample matrices for 
each type, hence test a total of 600 different matrices. We compar e these relaxation bounds with the 
upper bounds produced by sequential convex optimization as in ljuditsky and Nemirovskil (|2008L 
§4.1). In the Gaussian case, we also compare these relaxation bo unds with the asymptotic th resh- 
olds on strong and weak (high probability) recovery discussed in lDonoho and Tannei ( 2008). The 
semidefinite bounds on a fc always match with the LP bounds in ljuditsky and Nemirovskil (|2008[) 
when k — 1 (both are tight), and are often smaller than LP bounds whenever k is greater than 1 on 
Gaussian or Bernoulli matrices. The semidefinite upper bound on a fc was smaller than the LP one 
in 563 out of the 600 matrices sampled here, with the difference ranging from 4e-2 to -9e-4. Of 
course, this semidefinite relaxation is significantly more expensive than the LP based one and that 
these experiments thus had to be performed on very small matrices. 



6.3 Tightness 

Section [4] shows that the tightness of the semidefinite relaxation is explicitly controlled by the 
following quantity 

A* = 9(X,5)h(Y,n,h,5), 
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Relaxation 


p 




«2 


a 3 


«4 


a 5 


Upper bound 


LP 


0.5 


0.21 


0.38 


0.57 


0.82 


0.98 


2 


SDP 


0.5 


0.21 


0.38 


0.57 


0.82 


0.98 


2 


SDP low. 


0.5 


0.05 


0.10 


0.16 


0.24 


0.32 


2 


LP 


0.6 


0.16 


0.31 


0.46 


0.61 


0.82 


3 


SDP 


0.6 


0.16 


0.31 


0.46 


0.61 


0.82 


3 


SDP low. 


0.6 


0.04 


0.09 


0.15 


0.20 


0.31 


3 


LP 


0.7 


0.12 


0.25 


0.39 


0.50 


0.62 


4 


SDP 


0.7 


0.12 


0.25 


0.39 


0.50 


0.62 


4 


SDP low. 


0.7 


0.04 


0.09 


0.14 


0.18 


0.22 


4 


LP 


0.8 


0.10 


0.20 


0.30 


0.38 


0.48 


6 


SDP 


0.8 


0.10 


0.20 


0.30 


0.38 


0.48 


6 


SDP low. 


0.8 


0.04 


0.07 


0.13 


0.17 


0.23 


6 



Table 1: Given ten sample Fourier matrices of leading dimension n = 40, we list median 
upper bounds on the values of for various cardinaliti es k and matrix shape ratios p , 
computed using the linear programming (LP) relaxation in ljuditsky and Nemirovskil (|2008l ) 
and the semidefmite relaxation (SDP) detailed in this paper. We also list the upper bound on 
strong recovery computed using sequential convex optimization and the lower bound on otk 
obtained by randomization using the SDP solution (SDP low.). Values of below 1/2, for 
which strong recovery is certified, are highlighted in bold. 



where g and h are defined in (flTl) and (fT8l) respectively. In Figure[3l we plot the histogram of values 
of \i for all 600 sample matrices computed above, and plot the same histogram on a subset of these 
results where the target cardinality k was set to 1. We observe that while the relaxation performed 
quite well on most of these examples, the randomization bound on performance often gets very 
large whenever k > 1. This can probably be explained by the fact that we only control the mean in 
Lemma [3j not the quantile. We also notice that ji is highly concentrated when k = 1 on Gaussian 
and Bernoulli matrices (where the results in Tables [2] and [3] are tight), while the performance is 
markedly worse for Fourier matrices. 

Finally, Tables [2] and [3] show that lower bounds on a>i obtained by randomization for Gaussian 
are always tight (the solution of the SDP was very close to rank one), while performance on higher 
values of k and Fourier matrices is much worse. On 6 of these experiments however, the SDP 
randomization lower bound was higher than 1/2, which proved that a 5 > 1/2, hence that the 
matrix did not satisfy the nullspace property at order 5. 



6.4 Numerical complexity 

We implemented the algorithm of Section © in MATLAB and tested it on random matrices. 
While the code handles matrices with n = 500, it is still conside rably slower than similar first- 
order algorithms applied to sparse PCA problems for example (see ld'Aspremont et al.1 (|2007f) ). A 
possible explanation for this gap in performance is perhaps that the DSPCA semidefmite relaxation 
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Relaxation 


p 


a\ 


Ct2 


03 


«4 


Q!5 


Strong k 


Weak k 


LP 


0.5 


0.27 


0.49 


0.67 


0.83 


0.97 


2 


11 


SDP 


0.5 


0.27 


0.49 


0.65 


0.81 


0.94 


2 


11 


SDP low. 


0.5 


0.27 


0.31 


0.33 


0.32 


0.35 


2 


11 


LP 


0.6 


0.22 


0.41 


0.57 


0.72 


0.84 


2 


12 


SDP 


0.6 


0.22 


0.41 


0.56 


0.70 


0.82 


2 


12 


SDP low. 


0.6 


0.22 


0.29 


0.31 


0.32 


0.36 


2 


12 


LP 


0.7 


0.20 


0.34 


0.47 


0.60 


0.71 


3 


14 


SDP 


0.7 


0.20 


0.34 


0.46 


0.59 


0.70 


3 


14 


SDP low. 


0.7 


0.20 


0.27 


0.31 


0.35 


0.38 


3 


14 


LP 


0.8 


0.15 


0.26 


0.37 


0.48 


0.58 


3 


16 


SDP 


0.8 


0.15 


0.26 


0.37 


0.48 


0.58 


3 


16 


SDP low. 


0.8 


0.15 


0.23 


0.28 


0.33 


0.38 


3 
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Table 2: Given ten sample Gaussian matrices of leading dimension n = 40, we list median 
upper bounds on the values of for various cardinaliti es k and matrix shape ratios p , 
computed using the linear programming (LP) relaxation in ljuditsky and Nemirovskil (120081) 
and the semidefinite relaxation (SDP) detailed in this paper. We also list the asymptotic 
upper bound on both strong and weak recovery computed in iDonoho and Tanner! (12008b 
and the lower bound on ctfe obtained by randomization using the SDP solution (SDP low.). 
Values of below 1/2, for which strong recovery is certified, are highlighted in bold. 



Relaxation 


P 




«2 


«3 


«4 


a 5 


Upper bound 


LP 


0.5 


0.25 


0.45 


0.64 


0.82 


0.97 


2 


SDP 


0.5 


0.25 


0.45 


0.63 


0.80 


0.94 


2 


SDP low. 


0.5 


0.25 


0.28 


0.29 


0.29 


0.34 


2 


LP 


0.6 


0.21 


0.38 


0.55 


0.69 


0.83 


3 


SDP 


0.6 


0.21 


0.38 


0.54 


0.68 


0.81 


3 


SDP low. 


0.6 


0.21 


0.26 


0.29 


0.33 


0.34 


3 


LP 


0.7 


0.17 


0.32 


0.46 


0.58 


0.70 


4 


SDP 


0.7 


0.17 


0.32 


0.46 


0.58 


0.69 


4 


SDP low. 


0.7 


0.17 


0.24 


0.29 


0.33 


0.37 


4 


LP 


0.8 


0.14 


0.26 


0.38 


0.47 


0.57 


5 


SDP 


0.8 


0.14 


0.26 


0.37 


0.47 


0.57 


5 


SDP low. 


0.8 


0.14 


0.21 


0.27 


0.33 


0.38 


5 



Table 3: Given ten sample Bernoulli matrices of leading dimension n = 40, we list median 
upper bounds on the values of for various cardinaliti es k and matrix shape ratios p , 
computed using the linear programming (LP) relaxation in ljuditsky and Nemirovskil (I2008h 
and the semidefinite relaxation (SDP) detailed in this paper. We also list the upper bound on 
strong recovery computed using sequential convex optimization and the lower bound on a*, 
obtained by randomization using the SDP solution (SDP low.). Values of below 1/2, for 
which strong recovery is certified, are highlighted in bold. 
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Figure 3: Tightness. Left: Histogram of fi = g(X, 5)h{Y, n, k, 5) defined in (fT71) and (fT8T ). 
computed for all sample solution matrices in the experiments above when k > 1. Right: 
Idem using only examples where the target cardinality is k = 1, for Gaussian and Bernoulli 
matrices (light grey) or Fourier matrices (dark grey). 



n 


50 


100 


200 


500 


CPU time 


00 h 01 m 


00 h 10 m 


01 h 38 m 


37 h 22 m 



Table 4: CPU time to show a\ < 1/2, using the algorithm of Section [5] on Gaussian 
matrices with shape ratio p = .7 for various values of n. 



is always tight (in practice at least) hence iterates near the solution tend to be very close to rank one. 
This is not the case here as the matrix in © is very rarely rank one and the number of significant 
eigenvalues has a direct impact on actual convergence speed. To illustrate this point, Figure |4] 
shows a Scree plot of the optimal solution to © for a small Gaussian matrix (obtained by IP 
methods with a target precision of 10~ 8 ), while Table 0] shows, as a benchmark, total CPU time for 
proving that a x < 1/2 on Gaussian matrices, for various values of n. We set the accuracy le — 2 
and stop the code whenever positive objective values are reached. Unfortunately, performance 
for larger values of k is typically much worse (which is why we used IP methods to run most 
experiments in this section) and in many cases, convergence is hard to track as the dual objective 
values computed using the gradient in ( 1231 ) produces a relatively coarse gap bounds as illustrated 
in Figure|4]for a small Gaussian matrix. 
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Figure 4: Complexity. Left: Primal and dual bounds on the optimal solution (computed 
using interior point methods) using the algorithm of Section |5]on a small Gaussian matrix. 
Right: Scree plot of the optimal solution to © for a small Gaussian matrix (obtained by 
interior point methods with a target precision of 10~ 8 ). 



7 Conclusion & Directions for Further Research 



We have detailed a semidefi nite relaxation for the p rob lem of testing if a matrix satisfies the 
nullspace property defined in iDonoho and Hud (I200ll) or ICohen et al.l (|2006|). This relaxation is 
tight fo r k = 1 and matches (numerically) the linear programming relaxation in ljuditsky and Nemirovski 



(120081) . It is often slightly tighter (again numerically) for larger values of k. We can also remark 
that the matrix A only appears in the relaxation (flOl) in "kernel" format A T A, where the constraints 
are linear in the kernel matrix A T A. This means that this relaxation might allow sparse experiment 
design problems to be solved, while maintaining convexity. 

Of course, these small scale experiments do not really shed light on the actual performance 
of both relaxations on larger, more realistic problems. In particular, applications in imaging and 
signal processing would require solving problems where bo th n and k are several orders of mag - 
nitude larger than the values considered in this paper or in ljuditsky and Nemirovskil (|2008l) and 
the question of finding tractable relaxations or algorithms that can handle such problem sizes re- 
mains open. Finally, the t hree different tractable te s ts for sparse recovery conditions, derived in 



d'Aspremont et al.l (|2QQ8h . I Juditsky and Nemirovskil (120081) and this paper, are all limited to show- 
ing recovery at the (suboptimal) rate k = 0(y/m). Finding tractable test for sparse recovery at 
cardinalities k closer to the optimal rate 0(m) also remains an open problem. 
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